Event-based Clustering for Reducing Labeling Costs of Incident-Related Microposts
Abstract
The sheer volume of social media data makes machine learning indispensable for automatically identifying the event type of event-related information. However, this depends heavily on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Current approaches, however, focus on the thematic dimension, i.e., the event type, when selecting instances to label; other metadata such as spatial and temporal information, which is helpful for achieving a more fine-grained clustering, is not yet taken into account. Also, labeling quality is always assumed to be perfect, as no qualitative information is currently available for manual event type labeling. In this paper, we present a novel event-based clustering strategy that uses temporal, spatial, and thematic metadata to determine instances to label. Furthermore, we inspect the quality of manual labeling in a crowdsourcing study comparing experts and non-experts. An evaluation on incident-related tweets shows that (i) labels provided by crowdsourcing are of acceptable quality and (ii) our selection strategy for active learning outperforms current state-of-the-art approaches even with few labeled instances.

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.
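The abstract's idea of grouping microposts by time, place, and event type before choosing which ones to label could be sketched roughly as follows. This is only an illustrative sketch, not the authors' method: the `Micropost` fields, the grid-cell and time-window bucketing, the default sizes (`cell_deg`, `window_s`), and the "least confident post per cluster" rule are all assumptions made for this example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Micropost:
    # All fields are hypothetical; the source does not specify a data model.
    text: str
    lat: float            # latitude in degrees
    lon: float            # longitude in degrees
    timestamp: float      # seconds since an arbitrary epoch
    predicted_type: str   # event type guessed by the current classifier
    confidence: float     # classifier confidence in [0, 1]


def cluster_key(post, cell_deg=0.01, window_s=1800):
    """Bucket a post by spatial grid cell, time window, and predicted type.

    The bucket sizes are assumed values chosen for illustration only.
    """
    return (
        math.floor(post.lat / cell_deg),
        math.floor(post.lon / cell_deg),
        math.floor(post.timestamp / window_s),
        post.predicted_type,
    )


def select_for_labeling(posts, budget):
    """Pick at most `budget` posts, one per spatio-temporal-thematic cluster."""
    clusters = defaultdict(list)
    for post in posts:
        clusters[cluster_key(post)].append(post)

    # Assumed heuristic: each cluster is represented by its least confident post.
    representatives = [
        min(members, key=lambda p: p.confidence)
        for members in clusters.values()
    ]
    # Ask for labels on the most uncertain clusters first.
    representatives.sort(key=lambda p: p.confidence)
    return representatives[:budget]
```

Posts that share a grid cell, a time window, and a predicted type fall into one cluster, so a single label request stands in for the whole group, which is how such a scheme might reduce labeling effort.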
Similar resources
Event-Based Clustering for Reducing Labeling Costs of Event-related Microposts
Automatically identifying the event type of event-related information in the sheer amount of social media data makes machine learning inevitable. However, this is highly dependent on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Albeit the thematic dimension is already used, other metadata such...
Edge pair sum labeling of some cycle related graphs
Let G be a (p,q) graph. An injective map $f : E(G) \to \{\pm 1, \pm 2, \dots, \pm q\}$ is said to be an edge pair sum labeling if the induced vertex function $f^* : V(G) \to \mathbb{Z} \setminus \{0\}$ defined by $f^*(v) = \sum_{e \in E_v} f(e)$ is one-one, where $E_v$ denotes the set of edges in G that are incident with a vertex v and $f^*(V(G))$ is either of the form $\{\pm k_1, \pm k_2, \dots, \pm k_{p/2}\}$ or $\{\pm k_1, \pm k_2, \dots, \pm k_{(p-1)/2}\} \cup \{\pm k_{(p+1)/2}\}$ according as p is even or o...
Estimation cost of occupational accidents by SACA method: A case study in one of the South Pars Gas Refinery
Introduction: Occupational accidents impose a lot of direct and indirect costs on the national economy of the country. Because, as a fact, resources are limited for reducing the risks and costs can affect the optimal investment in safety issues, the aim of this study was to calculate the economic costs of occupational accidents in one of the South Pars Gas Refinery, Assalouyeh, Bo...
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...
th Workshop on Making Sense of Microposts (#Microposts2015) Big things
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms which all use different information sources—either textual, temporal, geographic or community features—have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic inform...
Publication date: 2015